Randomized Strong Recursive Skeletonization: Simultaneous compression and factorization of $\mathcal{H}^{2}$-matrices in the Black-Box Setting
The hierarchical matrix ($\mathcal{H}^{2}$-matrix) formalism provides a way
to reinterpret the Fast Multipole Method and related fast summation schemes in
linear algebraic terms. The idea is to tessellate a matrix into blocks in such
a way that each block is either small or of numerically low rank; this enables
the storage of the matrix and the application of it to a vector in linear or
close to linear complexity. A key motivation for the reformulation is to extend
the range of dense matrices that can be represented. Additionally,
$\mathcal{H}^{2}$-matrices in principle also extend the range of operations
that can be executed to include matrix inversion and factorization. While such
algorithms can be highly efficient for certain specialized formats (such as
HBS/HSS matrices based on ``weak admissibility''), inversion algorithms for
general $\mathcal{H}^{2}$-matrices tend to be based on nested recursions and
recompressions, making them challenging to implement efficiently. An exception
is the \textit{strong recursive skeletonization (SRS)} algorithm by Minden, Ho,
Damle, and Ying, which involves a simpler algorithmic flow. However, SRS
greatly increases the number of blocks of the matrix that need to be stored
explicitly, leading to high memory requirements. This manuscript presents the
\textit{randomized strong recursive skeletonization (RSRS)} algorithm, which is
a reformulation of SRS that incorporates the randomized SVD (RSVD) to
simultaneously compress and factorize an $\mathcal{H}^{2}$-matrix. RSRS is a
``black box'' algorithm that interacts with the matrix to be compressed only
via its action on vectors; this extends the range of the SRS algorithm (which
relied on the ``proxy source'' compression technique) to include dense matrices
that arise in sparse direct solvers.
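The black-box compression primitive underlying RSRS is the randomized SVD, which touches the matrix only through its action on vectors. A minimal sketch of that primitive (the function and variable names here are illustrative, not taken from the paper's code):

```python
import numpy as np

def randomized_svd(matvec, rmatvec, n, rank, oversample=10, seed=None):
    """Rank-`rank` SVD of an n x n operator accessed only via its action
    on vectors (matvec) and its adjoint action (rmatvec)."""
    rng = np.random.default_rng(seed)
    k = rank + oversample
    # Sample the range of A with a Gaussian test matrix.
    Omega = rng.standard_normal((n, k))
    Y = matvec(Omega)                      # Y = A @ Omega
    Q, _ = np.linalg.qr(Y)                 # orthonormal basis for range(A)
    # Project onto the basis: B = Q^T A, formed via the adjoint action.
    B = rmatvec(Q).T                       # (A^T Q)^T = Q^T A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Usage: compress a dense kernel matrix through its matvec alone.
x = np.linspace(0.0, 1.0, 200)
A = np.exp(-np.abs(np.subtract.outer(x, x)))   # smooth kernel, rapid decay
U, s, Vt = randomized_svd(lambda X: A @ X, lambda X: A.T @ X, 200, rank=20)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

In RSRS this idea is applied blockwise within the hierarchical tessellation rather than to the matrix as a whole; the sketch only shows the matvec-driven access pattern that makes the method "black box".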
SlabLU: A Two-Level Sparse Direct Solver for Elliptic PDEs
The paper describes a sparse direct solver for the linear systems that arise
from the discretization of an elliptic PDE on a two dimensional domain. The
solver is designed to reduce communication costs and perform well on GPUs; it
uses a two-level framework, which is easier to implement and optimize than
traditional multi-frontal schemes based on hierarchical nested dissection
orderings. The scheme decomposes the domain into thin subdomains, or "slabs".
Within each slab, a local factorization is executed that exploits the geometry
of the local domain. A global factorization is then obtained through the LU
factorization of a block-tridiagonal reduced coefficient matrix. The solver has
complexity $O(N^{5/3})$ for the factorization step, and $O(N^{7/6})$ for each
solve once the factorization is completed.
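The global step, LU factorization of a block-tridiagonal reduced coefficient matrix, amounts to a block Thomas algorithm. A schematic dense-block version (the paper's solver additionally exploits structure within the blocks, which this sketch omits):

```python
import numpy as np

def block_tridiag_solve(D, L, U, b):
    """Solve a block-tridiagonal system by block LU (Thomas) elimination.
    D[i]: diagonal blocks; L[i], U[i]: sub-/super-diagonal coupling blocks."""
    n = len(D)
    D, b = [d.copy() for d in D], [v.copy() for v in b]
    for i in range(1, n):                    # forward elimination
        W = L[i-1] @ np.linalg.inv(D[i-1])   # block multiplier
        D[i] -= W @ U[i-1]                   # Schur complement for slab i
        b[i] -= W @ b[i-1]
    x = [None] * n                           # back substitution
    x[-1] = np.linalg.solve(D[-1], b[-1])
    for i in range(n - 2, -1, -1):
        x[i] = np.linalg.solve(D[i], b[i] - U[i] @ x[i+1])
    return x

# Usage: 4 "slabs" with 3x3 blocks, checked against a dense solve.
rng = np.random.default_rng(0)
m, n = 3, 4
D = [rng.standard_normal((m, m)) + 10*np.eye(m) for _ in range(n)]
L = [rng.standard_normal((m, m)) for _ in range(n - 1)]
U = [rng.standard_normal((m, m)) for _ in range(n - 1)]
b = [rng.standard_normal(m) for _ in range(n)]
A = np.zeros((n*m, n*m))
for i in range(n):
    A[i*m:(i+1)*m, i*m:(i+1)*m] = D[i]
    if i < n - 1:
        A[(i+1)*m:(i+2)*m, i*m:(i+1)*m] = L[i]
        A[i*m:(i+1)*m, (i+1)*m:(i+2)*m] = U[i]
x = block_tridiag_solve(D, L, U, b)
err = np.linalg.norm(A @ np.concatenate(x) - np.concatenate(b))
```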
The solver described is compatible with a range of different local
discretizations, and numerical experiments demonstrate its performance for
regular discretizations of rectangular and curved geometries. The technique
becomes particularly efficient when combined with very high-order convergent
multi-domain spectral collocation schemes. With this discretization, a
Helmholtz problem on a domain of size $1000\lambda \times 1000\lambda$ (for
which $N = 100\,\mbox{M}$) is solved in 15 minutes to 6 correct digits on a
high-powered desktop with GPU acceleration.
Parallel Optimizations for the Hierarchical Poincar\'e-Steklov Scheme (HPS)
Parallel optimizations for the 2D Hierarchical Poincar\'e-Steklov (HPS)
discretization scheme are described. HPS is a multi-domain spectral collocation
scheme that allows for combining very high order discretizations with direct
solvers, making the discretization powerful in resolving highly oscillatory
solutions to high accuracy. HPS can be viewed as a domain decomposition scheme
where the domains are connected directly through the use of a sparse direct
solver. This manuscript describes optimizations of HPS that are simple to
implement, and that leverage batched linear algebra on modern hybrid
architectures to improve the practical speed of the solver. In particular, the
manuscript demonstrates that the traditionally high cost of performing local
static condensation for discretizations involving very high local order can
be reduced dramatically.
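The batched-linear-algebra pattern is simple to illustrate: stack the local subdomain blocks along a leading axis, and one batched call performs static condensation (elimination of interior nodes) for every subdomain at once. A sketch with made-up block sizes, using NumPy's broadcasting of `linalg.solve` over the leading (batch) dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
nboxes, ni, nb = 64, 81, 36   # subdomains, interior nodes, boundary nodes
                              # (illustrative sizes, not HPS-specific)

# Local operator blocks for every subdomain, stacked along the batch axis.
Aii = rng.standard_normal((nboxes, ni, ni)) + 5.0 * np.eye(ni)  # interior-interior
Aib = rng.standard_normal((nboxes, ni, nb))                     # interior-boundary
Abi = rng.standard_normal((nboxes, nb, ni))                     # boundary-interior
Abb = rng.standard_normal((nboxes, nb, nb))                     # boundary-boundary

# Batched static condensation: S = Abb - Abi @ inv(Aii) @ Aib for all
# subdomains in one call; np.linalg.solve broadcasts over the batch axis.
X = np.linalg.solve(Aii, Aib)    # shape (nboxes, ni, nb)
S = Abb - Abi @ X                # one Schur complement per subdomain
```

On a GPU the same pattern maps onto batched BLAS/LAPACK routines (e.g. batched LU), which is the optimization the manuscript leverages; the NumPy sketch only shows the data layout.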
SkelFMM: A Simplified Fast Multipole Method Based on Recursive Skeletonization
This work introduces the kernel-independent multi-level algorithm "skelFMM"
for evaluating all pairwise interactions between points connected through a
kernel such as the fundamental solution of the Laplace or the Helmholtz
equations. The method is based on linear algebraic tools such as randomized low
rank approximation and "skeleton representations" of far-field interactions.
The work is related to previously proposed linear algebraic reformulations of
the fast multipole method (FMM), but is distinguished by relying on simpler
data structures. In particular, skelFMM does not require an "interaction list",
as it relies instead on algebraically-modified kernel interactions between
near-neighbors at every level. Like other kernel-independent algorithms, it
only requires evaluation of the kernel function, allowing the methodology to
easily be extended to a range of different kernels in 2D and 3D. The simplicity
of the algorithm makes it particularly amenable to parallel implementation on
heterogeneous hardware architectures.
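The "skeleton representation" idea can be illustrated with a column-pivoted QR: for a far-field interaction block, a small set of skeleton source points reproduces every column to a tolerance. A sketch of this interpolative-decomposition-style compression (the helper below is hypothetical, not the skelFMM code), applied to a $1/r$ kernel between two well-separated point clusters:

```python
import numpy as np
from scipy.linalg import qr

def skeletonize(A, tol=1e-10):
    """Pick skeleton columns of A via column-pivoted QR so that
    A ~= A[:, idx] @ T (an interpolative decomposition sketch)."""
    Q, R, piv = qr(A, pivoting=True, mode='economic')
    d = np.abs(np.diag(R))
    k = int(np.sum(d > tol * d[0]))            # numerical rank estimate
    idx = piv[:k]                              # skeleton column indices
    # Express all (pivoted) columns in terms of the skeleton columns.
    T = np.linalg.solve(R[:k, :k], R[:k, :])
    Tfull = np.empty_like(T)                   # undo the column pivoting
    Tfull[:, piv] = T
    return idx, Tfull

# Far-field block: 1/r kernel between two well-separated clusters.
rng = np.random.default_rng(0)
src = rng.random((300, 2))
trg = rng.random((200, 2)) + 5.0               # well separated from src
A = 1.0 / np.sqrt(((trg[:, None, :] - src[None, :, :])**2).sum(-1))
idx, T = skeletonize(A)
err = np.linalg.norm(A - A[:, idx] @ T) / np.linalg.norm(A)
```

Because the two clusters are well separated, the numerical rank (the number of skeleton points) is far smaller than the number of sources, which is what makes skeleton representations of far-field interactions cheap.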
The performance of the algorithm is demonstrated through numerical
experiments conducted on uniform and non-uniform point distributions in 2D and
3D, involving Laplace and (low frequency) Helmholtz kernels. The algorithm
relies on a precomputation stage that constructs a tailored representation for
a given geometry of points. Once the precomputation has completed, the
matrix-vector multiplication attains high speed through GPU acceleration that
leverages batched linear algebra.